Hierarchical Bayesian Language Modelling for the Linguistically Informed

Author

  • Jan A. Botha
Abstract

In this work I address the challenge of augmenting n-gram language models according to prior linguistic intuitions. I argue that the family of hierarchical Pitman-Yor language models is an attractive vehicle through which to address the problem, and demonstrate the approach by proposing a model for German compounds. In an empirical evaluation, the model outperforms the Kneser-Ney model in terms of perplexity, and achieves preliminary improvements in English-German translation.

Similar resources

Patient Safety and Healthcare Quality: The Case for Language Access

This paper aims to provide a description of the need for Culturally and Linguistically Appropriate Services (CLAS) for Limited English Proficient (LEP) patients, an identification of how the lack of CLAS for LEP patients can compromise patient safety and healthcare quality, and discuss barriers to the provision of CLAS.

Extending Phrase-Based Decoding with a Dependency-Based Reordering Model

Phrase-based decoding is conceptually simple and straightforward to implement, at the cost of drastically oversimplified reordering models. Syntactically aware models make it possible to capture linguistically relevant relationships in order to improve word order, but they can be more complex to implement and optimise. In this paper, we explore a new middle ground between phrase-based and synta...

LAMP-TR-152 CS-TR-4947 UMIACS-TR-2009-15 November 2009 Extending Phrase-Based Decoding with a Dependency-Based Reordering Model

Bayesian Hierarchical Modelling for Tailoring Metric Thresholds

Software is highly contextual. While there are cross-cutting ‘global’ lessons, individual software projects exhibit many ‘local’ properties. This data heterogeneity makes drawing local conclusions from global data dangerous. A key research challenge is to construct locally accurate prediction models that are informed by global characteristics and data volumes. Previous work has tackled this pro...

A Model for Tax Evasion Forcasting based on ID3 Algorithm and Bayesian Network

Nowadays, knowledge is a valuable and strategic source as well as an asset for evaluation and forecasting. Presenting these strategies in discovering corporate tax evasion has become an important topic today and various solutions have been proposed. In the past, various approaches to identify tax evasion and the like have been presented, but these methods have not been very accurate and the ove...

Publication date: 2012